Abstract

We have extended Maruyama's [5, 6, 7] constraint dependency 
grammar (CDG) to process a lattice or graph of sentence hy- 
potheses instead of separate text strings. A post-processor to 
a speech recognizer producing N-best hypotheses generates the
word graph representation, which is then augmented with infor- 
mation required for parsing. We will summarize the CDG pars- 
ing algorithm and then describe how the algorithm is extended 
to process a word graph on a single processor machine. 



1 Introduction 

The most successful of the current speech recognition systems 
which process continuous speech for a limited (1000 word) vo- 
cabulary are those which utilize hidden Markov models (HMM). 
Most systems utilizing this approach (e.g., [4, 10]) have reduced
recognition errors by incorporating some language information 
(syntactic and semantic) directly into the HMM to reduce per- 
plexity, but since the goal of these systems is recognition, not 
understanding, no structural analysis of the utterance is con- 
structed. Instead, the output of such systems is an ordered list 
of the N most likely sentence hypotheses (where N is a constant 
usually less than 100) [9, 11]. If understanding becomes the goal, 
such systems must pass the sentence hypotheses through a nat- 
ural language parser as a first step toward producing meaning 
representations. A context-free grammar (CFG) parser would
require O(n^3) time to process each sentence hypothesis containing
n words.

Processing each sentence hypothesis individually is inefficient 
since the sentence hypotheses often differ only slightly from each 
other. Furthermore, a list of sentence hypotheses is not the most 
compact representation to provide a natural language parser. A 
better representation for the sentence hypotheses is a word graph
or lattice of word candidates which contains information on the 
approximate beginning and end point of each word's utterance 
to temporally relate the word candidates. We have conducted 
a simple experiment which demonstrates the compactness of a 
word graph. For this experiment, we selected three sets of N- 
best sentence hypotheses for three different types of utterances: 
a command, a yes-no question, and a wh-question. The list of 
the N-best sentences was converted to a word graph in which the 
duration of the node was determined by maintaining a syllable 
count through the utterance. The size of the constructed word 
graphs is compared with the number of words in the lists of N-
best sentences (Ss) in table 1. The word graphs provided an 83%
reduction in storage. 
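To make the storage comparison concrete, the following minimal sketch counts the words in an N-best list and the number of distinct words among them, using the eleven command hypotheses listed in section 3. The function name `storage_counts` is hypothetical, and the distinct-word count is only a rough proxy for graph size: an actual word graph also depends on the timing of each word candidate, which this sketch ignores.

```python
# The eleven N-best command hypotheses listed in section 3 (lower-cased).
N_BEST = [
    "clear all windows",
    "clear windows",
    "clear all the windows",
    "get all windows",
    "give all windows",
    "clear all of the windows",
    "clear the windows",
    "get all the windows",
    "give all the windows",
    "get all of the windows",
    "give all of the windows",
]


def storage_counts(sentences):
    """Return (number of sentences, total words, distinct words)."""
    words = [word for sentence in sentences for word in sentence.split()]
    return len(sentences), len(words), len(set(words))


print(storage_counts(N_BEST))
```

For the command set this gives 11 sentences, 41 words, and 7 distinct words, which agrees with the first three entries of the Command row in table 1.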



Sentence    Number   Number   Distinct   Number   Words in
Type        Ss       Words    Words      Nodes    Graph

Command     11       41       7
Yes-No-Q    20       120      17         11
Wh-Q        20                11         19

Table 1. N-best sentences versus a word graph.

Even though a word graph is a compact representation for the
output of a speech recognition system, current systems do not
provide this type of representation. However, parsers that can



process the graph representation will more efficiently process all
sentence hypotheses. Tomita [12] has developed an LR parsing 
algorithm capable of processing a word graph, though it does not 
use the graph directly in the parsing algorithm. On the other 
hand, we have developed a natural language framework based 
on Maruyama's constraint dependency grammar (CDG) [5, 6, 7]
which allows us to process a word graph of sentence hypotheses 
directly. 

In this paper, we will describe how a CDG parser performs 
the syntactic analysis of a single sentence and then describe the 
modifications required to process word graphs. 



2 Constraint Dependency Grammars 

To develop a syntactic analysis for a sentence using CDG, a con-
straint network (CN) of word nodes is constructed. Associated
with each node is its position and a set of roles, which indicate
the various functions the word fills in a sentence. Though two 
roles are required to write a grammar at least as expressive as 
a CFG [5], our examples depict a single role to simplify the dis- 
cussion. The role described in the examples is the governor role, 
which represents the function a word fills given that it modifies 
its head. For example, given that the head of a noun phrase is 
a noun, the word the has the function of a determiner when it 
modifies the head noun. 

Each role is initially assigned all role values allowed by the 
word's lexical category, where a role value consists of a label (the 
function the word can serve, e.g., SUBJ) and a modifiee (the num- 
ber corresponding to the position of the word which it modifies, 
or nil). There are p*q*n = O(n) possible role values (where p, the
number of roles per word, and q, the number of different labels,
are grammatical constants and n is the number of modifiees or
words) for each of the n words in the sentence, giving O(n^2) role
values altogether and requiring O(n^2) time to generate. Figure 1
shows the initialization of the role values for the sentence A fish 
eats. 



[Figure 1: each word node is labeled with its word, position, role, and role values, e.g., {SUBJ-nil, SUBJ-1, SUBJ-3} and {ROOT-nil, ROOT-1, ROOT-2}.]

Figure 1. The word nodes for A fish eats.
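The initialization step can be sketched as follows. This is an illustration under stated assumptions rather than the parser's actual code: the lexicon `ALLOWED_LABELS` is hypothetical and gives each word a single governor label, nil is represented by `None`, and a word is not allowed to modify itself, which matches the role values shown in figure 1.

```python
# Hypothetical lexicon: governor-role labels allowed for each word.
ALLOWED_LABELS = {"a": ["DET"], "fish": ["SUBJ"], "eats": ["ROOT"]}


def initial_role_values(words):
    """Map each 1-based word position to its initial (label, modifiee) pairs.

    The modifiee is None (nil) or the position of any other word.
    """
    positions = range(1, len(words) + 1)
    network = {}
    for pos, word in zip(positions, words):
        modifiees = [None] + [m for m in positions if m != pos]
        network[pos] = [
            (label, m) for label in ALLOWED_LABELS[word] for m in modifiees
        ]
    return network


network = initial_role_values(["a", "fish", "eats"])
for pos, values in network.items():
    print(pos, values)
```

Each of the n words receives O(n) role values, so the nested loops generate O(n^2) role values in total.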

Once the word nodes are constructed, constraints are applied
to the role values to eliminate the ungrammatical ones. A con-
straint is an if-then rule which must be satisfied by the role values.
First, unary constraints (i.e., constraints with a single variable)
are applied to each role value to eliminate ungrammatical role val-
ues from the roles. For example, the following unary constraint
eliminates all of the role values for eats except for ROOT-nil.

;; A verb has the governor label of ROOT and
;; a modifiee of nil.
(if (eq (category x) verb)
    (and (eq (label x) ROOT)
         (eq (modifiee x) nil)))

To apply this constraint to the network in figure 1, each role
value for every role is examined to ensure that it obeys the con-
straint. A role value violates a constraint if and only if it causes
the antecedent of the constraint to evaluate to TRUE and the
consequent to evaluate to FALSE. A role value which violates
a unary constraint is eliminated from its role. Because a unary
constraint can be tested against one role value in constant time
and there are O(n^2) role values to check, the time to apply a sin-
gle unary constraint is O(n^2). Initially, many unary constraints
are applied to reduce the number of legal role values, requiring
O(k_u * n^2) time, where k_u represents the constant number of unary
constraints. Additional unary constraints are applied to the net-
work in figure 1 to eliminate the role values DET-nil, SUBJ-nil,
and SUBJ-1 (e.g., a SUBJ must modify a verb and a DET must
modify a noun).
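Unary constraint application can be sketched as a filter over the role values. The verb constraint below follows the rule given above; the two constraints on DET and SUBJ are hypothetical stand-ins (each requires a non-nil modifiee to the right) chosen only so that the sketch removes DET-nil, SUBJ-nil, and SUBJ-1 as described. The category table and function names are likewise assumptions.

```python
# Assumed lexical categories for the positions in "A fish eats".
CATEGORY = {1: "det", 2: "noun", 3: "verb"}

ROLE_VALUES = {
    1: [("DET", None), ("DET", 2), ("DET", 3)],
    2: [("SUBJ", None), ("SUBJ", 1), ("SUBJ", 3)],
    3: [("ROOT", None), ("ROOT", 1), ("ROOT", 2)],
}


def implies(antecedent, consequent):
    """A constraint is violated only if antecedent is true and consequent false."""
    return (not antecedent) or consequent


def verb_is_root(pos, label, mod):
    # The unary constraint from the text: a verb is ROOT with modifiee nil.
    return implies(CATEGORY[pos] == "verb", label == "ROOT" and mod is None)


def det_modifies_right(pos, label, mod):
    # Hypothetical stand-in: a DET modifies some word to its right.
    return implies(label == "DET", mod is not None and mod > pos)


def subj_modifies_right(pos, label, mod):
    # Hypothetical stand-in: a SUBJ modifies some word to its right.
    return implies(label == "SUBJ", mod is not None and mod > pos)


def apply_unary(role_values, constraints):
    """Keep only the role values that satisfy every unary constraint."""
    return {
        pos: [rv for rv in values if all(c(pos, *rv) for c in constraints)]
        for pos, values in role_values.items()
    }


remaining = apply_unary(
    ROLE_VALUES, [verb_is_root, det_modifies_right, subj_modifies_right]
)
print(remaining)
```

Each constraint is tested once against each of the O(n^2) role values, which gives the O(k_u * n^2) bound for k_u constraints.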

Next, the CN is prepared for the propagation of binary con- 
straints, which contain two variables and determine which pairs 
of role values can legally coexist. To keep track of pairs of role 
values, arcs connect the roles associated with each node to all 
other roles in the network. Each of the arcs has associated with 
it an arc matrix, whose row and column indices are the role values 
associated with the two roles. The elements of the arc matrices
can hold either a 1 (indicating that the two role values which
index it can legally coexist) or a 0 (indicating that either one or
the other role value can exist, but not simultaneously). Initially,
all entries in the matrices are set to 1, indicating that there is
nothing about one word's function which prohibits another word's
right to have a certain function in the sentence. After the arc ma-
trices are constructed in O(n^4) time, the binary constraints are
applied to the pairs of role values that represent the indices for
matrix entries. If a binary constraint fails for a pair of role values,
then they cannot coexist in the same sentence, which is indicated
by setting the entry in the matrix to zero. Figure 2 shows the
network after the propagation of the following binary constraint:

;; A DET (determiner) is governed by a head noun
;; with the label of SUBJ (subject), OBJ (direct
;; object), IOBJ (indirect object), or PP.OBJ
;; (object of a preposition).
(if (and (equal (label x) DET)
         (equal (modifiee x) (position y)))
    (or (equal (label y) SUBJ)
        (equal (label y) OBJ)
        (equal (label y) IOBJ)
        (equal (label y) PP.OBJ)))

Since it is applied to O(n^4) pairs of role values, the time to prop-
agate the constraint is O(n^4), and the time required to propagate
k_b binary constraints is O(k_b * n^4).
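The propagation of the binary constraint above can be sketched with dictionaries standing in for arc matrices. This is a minimal illustration, assuming the role values that survive the unary constraints for A fish eats; the names `build_arc_matrices` and `propagate` are hypothetical, and the constraint is tested with the two role values bound to x and y in both orders.

```python
from itertools import combinations

# Role values assumed to remain after unary constraints: position -> values.
ROLE_VALUES = {
    1: [("DET", 2), ("DET", 3)],
    2: [("SUBJ", 3)],
    3: [("ROOT", None)],
}
HEAD_LABELS = {"SUBJ", "OBJ", "IOBJ", "PP.OBJ"}


def det_governed_by_head(x, y):
    """x and y are (position, label, modifiee); True when the rule is satisfied."""
    (_, x_label, x_mod), (y_pos, y_label, _) = x, y
    antecedent = x_label == "DET" and x_mod == y_pos
    return (not antecedent) or y_label in HEAD_LABELS


def build_arc_matrices(role_values):
    """One matrix per pair of roles, with every entry initially 1."""
    return {
        (a, b): {(ra, rb): 1 for ra in role_values[a] for rb in role_values[b]}
        for a, b in combinations(sorted(role_values), 2)
    }


def propagate(matrices, constraint):
    """Zero every entry whose pair of role values violates the constraint."""
    for (a, b), matrix in matrices.items():
        for ra, rb in matrix:
            x, y = (a, *ra), (b, *rb)
            if not (constraint(x, y) and constraint(y, x)):
                matrix[(ra, rb)] = 0
    return matrices


matrices = propagate(build_arc_matrices(ROLE_VALUES), det_governed_by_head)
print(matrices[(1, 3)])
```

In this sketch the only entry set to zero pairs DET-3 with ROOT-nil: a determiner that modifies position 3 would be governed by a word labeled ROOT, which the constraint forbids.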

Following the propagation of binary constraints, the network
could still contain role values that would never be legal role values
in a parse for the sentence. The illegal role values can be elimi-
nated by filtering the CN. In filtering, a role value is removed from
its role and from the row or column it indexes for each matrix
associated with the arcs emanating from the role. For example,
the role value DET-3 in figure 2 can be eliminated from the role
for the word a and the rows indexed by that value can also be
eliminated from the matrices on the arcs emanating from that
role, resulting in an unambiguous parse for the sentence. The re-
maining role values form a parse graph for the sentence. A single
application of filtering may be insufficient to eliminate illegal role
values since the elimination of a role value from one role could
lead to the elimination of a role value from another role. Filtering
continues until there are no role values indexing matrix rows or
columns containing only zeros, requiring O(n^4) time (see [6]).
Figure 2. The CN for A fish eats after binary constraint
propagation.
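Filtering a CN can be sketched as a loop that runs until no role value changes. The matrices below are an assumed encoding of the state after binary propagation for A fish eats, and `supported` and `filter_network` are hypothetical names. Removing a role value from its role has the effect of deleting its row or column, because only the remaining role values are consulted afterwards.

```python
ROLE_VALUES = {
    1: {("DET", 2), ("DET", 3)},
    2: {("SUBJ", 3)},
    3: {("ROOT", None)},
}
# (role_a, role_b) -> {(value_a, value_b): 0 or 1}, assumed post-propagation.
MATRICES = {
    (1, 2): {(("DET", 2), ("SUBJ", 3)): 1, (("DET", 3), ("SUBJ", 3)): 1},
    (1, 3): {(("DET", 2), ("ROOT", None)): 1, (("DET", 3), ("ROOT", None)): 0},
    (2, 3): {(("SUBJ", 3), ("ROOT", None)): 1},
}


def supported(role, value, arc, matrix, role_values):
    """True if the row or column indexed by value still contains a 1."""
    a, b = arc
    if role == a:
        return any(matrix[(value, other)] for other in role_values[b])
    return any(matrix[(other, value)] for other in role_values[a])


def filter_network(role_values, matrices):
    """Remove unsupported role values repeatedly until nothing changes."""
    changed = True
    while changed:
        changed = False
        for arc, matrix in matrices.items():
            for role in arc:
                for value in list(role_values[role]):
                    if not supported(role, value, arc, matrix, role_values):
                        role_values[role].discard(value)
                        changed = True
    return role_values


parse = filter_network({k: set(v) for k, v in ROLE_VALUES.items()}, MATRICES)
print(parse)
```

Here DET-3 is removed because its row in the matrix between a and eats contains only zeros, leaving one role value per word.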

A CDG parser has several advantages over traditional CFG 
parsers. The set of languages which can be expressed by CDG
is a superset of the context-free languages (e.g., Maruyama [5, 6]
constructed a CDG grammar capable of parsing ww, where w
is an arbitrary string of terminal symbols). CDG also allows the
addition of a contextual dimension by applying different sets of
constraints in different situations. This flexibility is an advantage
of CDG over traditional CFG parsers, which use a single set of
rules to parse all sentences. CDG provides the grammar designer
with the flexibility to create grammars for free-order languages
like Latin or to add the order constraints necessary to parse lan-
guages like English. Additionally, because CDG uses constraints
instead of production rules, it is a simple matter to add exceptions
without increasing the size of the grammar.

On the other hand, CDG has a slower serial running time than a
CFG parser (O(n^4) compared to O(n^3), where n is the number of
words in a sentence). However, we have devised a parallelization
for the CDG parser [1] which uses O(n^4) processors to parse in
O(k) time for a CRCW P-RAM model (Concurrent Read, Concurrent
Write Parallel Random Access Machine), where n is the number
of words in the sentence and k, the number of constraints, is
a grammatical constant. Furthermore, this algorithm has been
simulated on the MasPar MP-1, which uses the special features
of the machine and O(n^4) processors to obtain an O(k + log(n))
running time. CFG parsing algorithms have been parallelized
[3]; however, achieving sub-linear parse times has required O(n^6)
processors [8].

3 Serial CDG Parsing of Word 
Graphs 

In this section, we describe how to augment a word graph to cre-
ate and parse a Spoken Language Constraint Network (SLCN).
Figure 3 depicts an SLCN derived from a word graph constructed
for the sentence hypotheses: A fish eat and Offices eats. By rep-
resenting these hypotheses in a word graph, we are also able to
process additional sentences (i.e., A fish eats and Offices eat) not
present in the list of hypotheses, but which could be the correct
utterance. Each word node contains information on the beginning
and end point of the utterance, represented as an integer tuple (b,
e). The tuple is more expressive than the point scheme used for
CNs and requires modification of the less-than and greater-than
predicates used in constraints. Notice that word nodes contain
a list of all word candidates with the same beginning and end
points, and edges join word nodes that can be adjacent in a sen-
tence hypothesis (see figure 3).
Figure 3. Example of a spoken language constraint net-
work constructed from a word graph.
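One plausible way to adapt the ordering predicates to (b, e) tuples is sketched below. The definition is an assumption made for illustration, not a definition taken from the parser: a word is treated as preceding another when its interval ends no later than the other begins, so overlapping intervals are ordered in neither direction.

```python
def less_than(x, y):
    """True if interval x = (b, e) ends no later than interval y begins."""
    return x[1] <= y[0]


def greater_than(x, y):
    """True if interval x begins no earlier than interval y ends."""
    return less_than(y, x)


a, fish, offices, eats = (1, 2), (2, 3), (1, 3), (3, 4)
print(less_than(a, fish), less_than(offices, fish), greater_than(eats, offices))
```

Under this definition, word candidates with overlapping intervals, such as those belonging to competing sentence hypotheses, are never ordered with respect to each other.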

To parse an SLCN, each word candidate contained in a word
node is assigned a set of role values for each role, requiring O(n^2)
time, where n is the number of word candidates in the graph.
Unary constraints are applied to each of the role values in the
network, as for CNs, requiring O(k_u * n^2) time. The preparation
of the SLCN for the propagation of binary constraints is similar
to that for a CN, except that two roles are joined with an arc
iff they can be members of the same sentence hypothesis (i.e.,
they are connected by a path of directed edges). For example,
there should be an arc between the roles for a and fish in figure
3, but not between the roles for a and offices. Additionally, any
of the matrix entries indexed by role values pointing to words
not contained in the sentence hypothesis supporting an arc are
automatically set to zero. In the SLCN of figure 3, the role values
ROOT-(1,2) and ROOT-(2,3) cannot coexist with the role values
for offices since they support another set of sentence hypotheses.
The time required to prepare an SLCN for the propagation of
binary constraints is O(n^4). As in a CN, binary constraints are
applied to pairs of role values in an SLCN, requiring O(k_b * n^4)
time.
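The rule for joining roles with arcs can be sketched as a reachability test on the word graph. The graph below is an assumed layout for the hypotheses A fish eat and Offices eats, inferred from the intervals mentioned in the text, and the sketch works at the level of word nodes rather than individual roles. The names `reachable` and `arcs` are hypothetical.

```python
from itertools import combinations

# Assumed word graph: node interval (b, e) -> successor nodes.
EDGES = {
    (1, 2): [(2, 3)],  # a
    (2, 3): [(3, 4)],  # fish
    (1, 3): [(3, 4)],  # offices
    (3, 4): [],        # eat, eats
}


def reachable(edges, start):
    """Return the set of nodes reachable from start along directed edges."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for nxt in edges[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def arcs(edges):
    """Join two nodes iff a directed path connects them in either direction."""
    reach = {node: reachable(edges, node) for node in edges}
    return {
        (u, v)
        for u, v in combinations(sorted(edges), 2)
        if v in reach[u] or u in reach[v]
    }


print(sorted(arcs(EDGES)))
```

With this layout the node for a is joined to the node for fish but not to the node for offices, as required.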

Filtering in an SLCN is complicated by the fact that the lim- 
itation of one word's function in one sentence hypothesis should 
not necessarily limit that word's function in another sentence. A 
role value cannot be removed from a role in an SLCN until it is 
eliminated from the matrices of all arcs incident to that role. This 
is most easily illustrated by considering the arc matrix depicted
in figure 3. Despite the fact that ROOT-(1,2) and ROOT-(2,3)
are not supported by the role values for offices, neither should
be ruled out until they are eliminated for the other sentence hy- 
potheses. The filtering algorithm for an SLCN requires the in- 
troduction of a new notation for specifying the conditions under 
which a role value should be eliminated. 

The following notational conventions are used to describe the 
algorithm. The capital letters A, B, X, and Y represent roles 
and the letter r represents a role value. Also, arc_matrix(A,B)
represents the arc matrix for the arc connecting A to B. The term
connected-roles(B) is the set of all roles which are connected
to B with arcs and supported-role-values(arc_matrix(A,B),
B) is a function which returns the list of role values corresponding
to the role B which are supported by the arc_matrix(A,B) (i.e.,
the indexed row or column contains at least one 1).

Suppose role A is connected by arcs to roles B and X. If
arc_matrix(A,B) does not support a role value r associated with
role A (i.e., r ∉ supported-role-values(arc_matrix(A,B),
A)), then how can we determine whether arc_matrix(A,X)
should continue to support r as a role value? The algorithm
should eliminate r from arc_matrix(A,X) iff X is a mem-
ber of every sentence that contains B; otherwise, it should
not be eliminated. Note that X is a member of every sen-
tence containing B in case 1 of figure 4, hence when r is elim-
inated from arc_matrix(A,B), it should be eliminated from
arc_matrix(A,X) and from the role A.

[Figure 4: three panels labeled Case 1, Case 2, and Case 3.]

Figure 4. Filtering Cases for an SLCN.

The conditions in which the role value should be maintained
are depicted in cases 2 and 3 in figure 4 and are enumerated
below:

1. If X is not a member of any of the same sentences as
B as shown in case 2 of figure 4 (i.e., X ∉ connected-
roles(B)) then the role value r should remain supported by
arc_matrix(A,X).

2. Even if X is a member of some of the same sentences as
B (i.e., X ∈ connected-roles(B)), the role value r should not
be eliminated from arc_matrix(A,X) if X is contained in
at least one other sentence not also containing B, as shown
in case 3. Such a sentence exists only when there exists
a role Y connected to roles A and X, but not to role
B: ∃Y (Y ∈ connected-roles(A)) ∧ (Y ≠ B) ∧ (Y ∈
connected-roles(X)) ∧ (Y ∉ connected-roles(B)) ∧ (r
∈ supported-role-values(arc_matrix(A,Y), A)).
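The two conditions above, together with case 1, can be written as a single predicate. This is a sketch under assumptions: `connected` is a hypothetical dictionary mapping each role to its set of connected roles, and `supported` is a hypothetical dictionary mapping a pair (A, Y) to the role values of A supported by arc_matrix(A,Y).

```python
def should_eliminate(r, A, B, X, connected, supported):
    """Given that arc_matrix(A,B) does not support r, decide whether r
    should also be eliminated from arc_matrix(A,X)."""
    if X not in connected[B]:
        return False  # condition 1 (case 2): X shares no sentence with B
    for Y in connected[A]:
        if (
            Y != B
            and Y in connected[X]
            and Y not in connected[B]
            and r in supported.get((A, Y), set())
        ):
            return False  # condition 2 (case 3): a sentence with X but not B
    return True  # case 1: X is in every sentence that contains B


# Case 3 layout: Y is connected to A and X but not to B.
connected = {
    "A": {"B", "X", "Y"},
    "B": {"A", "X"},
    "X": {"A", "B", "Y"},
    "Y": {"A", "X"},
}
print(should_eliminate("r", "A", "B", "X", connected, {("A", "Y"): {"r"}}))
print(should_eliminate("r", "A", "B", "X", connected, {("A", "Y"): set()}))
```

The first call keeps r because arc_matrix(A,Y) still supports it in a sentence that excludes B; the second call eliminates r because no such support remains.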

Before filtering an SLCN, a preprocessing step is performed to 
set up equivalence classes of arcs incident to each role, requir- 
ing O(n^) time. If a role value is eliminable from one arc in an 
equivalence class, it is eliminable from all of them. Filtering of 
an SLCN, like a CN, requires that each role value be examined 
to determine whether it is disallowed by some arc matrix (i.e., 
the row or column indexed by the role value contains only 0s).
However, if a matrix disallows a role value, then instead of that 
role value being automatically eliminated from the role and all 
of the incident arc matrices, the equivalence classes are used to 
determine which of the arc matrices should eliminate that role 
value. The role value is eliminated from the role iff it is disal- 
lowed by all arcs incident to that role. Filtering continues until 
there are no more role values to eliminate, requiring O(n^4) time.

In an SLCN, if all of the role values in a role for a particu- 
lar word candidate are eliminated, then that word candidate is 
removed from the list of supported words. If all of the word can- 
didates for a word node are eliminated, then the word node is also
eliminated along with all of the arcs and edges attached to that 
node. Furthermore, word nodes which are no longer members of 
a legal sentence hypothesis (because there exists no path of edges 
between the beginning and end of the sentence going through that 
node) are also eliminated, requiring up to O(n) time.
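The removal of word nodes that no longer lie on any complete path can be sketched with one forward and one backward traversal of the word graph. The function name `on_some_path` and the example graph are assumptions made for illustration: the graph is the figure 3 layout assumed earlier, after the node for fish has been eliminated.

```python
def on_some_path(edges, starts, ends):
    """Return the nodes lying on at least one path from a start to an end node."""

    def closure(seeds, step):
        seen, stack = set(seeds), list(seeds)
        while stack:
            for nxt in step(stack.pop()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    # Build the reversed graph so that ends can be traced backward.
    reverse = {node: [] for node in edges}
    for node, successors in edges.items():
        for succ in successors:
            reverse[succ].append(node)

    forward = closure(starts, lambda n: edges[n])
    backward = closure(ends, lambda n: reverse[n])
    return forward & backward


# Assumed graph after the node for "fish" at (2, 3) has been removed.
edges = {(1, 2): [], (1, 3): [(3, 4)], (3, 4): []}
print(on_some_path(edges, starts=[(1, 2), (1, 3)], ends=[(3, 4)]))
```

In this example the node for a at (1, 2) can no longer reach the end of the sentence, so it is not returned and would be eliminated.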

The graphs created for the experiment in section 1 were con- 
verted to SLCNs and parsed using a grammar with three roles, 80 
unary constraints, and 190 binary constraints. More grammati- 
cal sentences were parsed in the SLCN than were available in the 
original sets of sentences; however, all of the additional parses 
had similar meanings to at least one of the original grammatical 
N-best sentences. 



Sentence    Number Grammatical   Number Grammatical
Type        Ss in N-best         Ss in SLCN

Command     11                   15
Yes-No-Q    5                    12
Wh-Q        7                    16

Table 2. N-best versus word graph sentence parses.



For example, consider the SLCN depicted in figure 5 and pro-
duced from the following N-best sentences:

1. Clear all windows.

2. Clear windows.

3. Clear all the windows.

4. Get all windows.

5. Give all windows.

6. Clear all of the windows.

7. Clear the windows.

8. Get all the windows.

9. Give all the windows.

10. Get all of the windows.

11. Give all of the windows.

The SLCN in figure 5 contains three verbs: clear, give, and get.
Each is the main verb in 5 minor parse variations contained in
the SLCN. The X-windows interface to the parser allows the user
to view the role values for each word candidate's roles. The word
windows over the interval (3,5) has one role value for each of its
three roles, which can be viewed by clicking on it in the word 
node. This interface also allows the user to view the matrices 
stored on each of the arcs in the network. 

The SLCN constraint parsing algorithm is rather slow, requir-
ing O(n^4) time to parse an SLCN with n word candidates. How-
ever, using a CRCW P-RAM model, an SLCN can be parsed in
O(k + n) time with O(n^4) processors [2]. The extra n term is
caused by the fact that when one word node is eliminated, any
word nodes that are no longer members of a legal sentence hy-
pothesis must also be eliminated from the network. If all of the
word candidates were eliminated, this would require O(n) time.

We are currently implementing the SLCN parsing algorithm 
on the MasPar MP-1. Because of the power of the MasPar and
the parallel nature of the algorithm, we will not have to sacri- 
fice the flexibility and expressivity of CDG grammars to improve 
performance when doing natural language parsing in a speech un- 
derstanding system. We are also developing prosodic and seman- 
tic constraints to demonstrate how easily additional knowledge 
sources can be incorporated into CDG parsers. 




Figure 5. The X-windows interface to the SLCN for the 
N-best commands. 